Course: Computational Thinking for Governance Analytics

Prof. José Manuel Magallanes, PhD


Session 7: Networks as Governance in R

Plan for this session:

  1. Importing Data and Building a Network
  2. Exploring the Network
  3. Exporting the Network

The network we are going to build is based on a relationship studied in this paper:

In that paper, the researcher builds a matrix of relationships like this:

1. Importing Data and Building a Network

The data was not available from the author’s website, so the matrix you see above was copied and pasted to Excel:

# opening excel
library(rio)

linkAdjMx='https://github.com/EvansDataScience/CTforGA_Networks/raw/main/dataFigueroa.xlsx'

adjacency=import(linkAdjMx,which = 1)

This data is organized as an adjacency matrix, so it should be square (as many rows as columns). Let’s check its dimensions:

dim(adjacency)
## [1] 37 38

Let’s take a look:

head(adjacency)
##          Names Romero Grana Miro Quesada Moreyra Fort De La Puente Wiese
## 1       Romero      0     1            1       1    1            1     0
## 2        Grana      1     0            1       0    1            1     1
## 3 Miro Quesada      1     1            0       0    1            1     1
## 4      Moreyra      1     0            0       0    1            1     1
## 5         Fort      1     1            1       1    0            1     0
## 6 De La Puente      1     1            1       1    1            0     1
##   Onrubia Brescia Nicolini Montero Picaso Bentin Benavides Bustamante
## 1       1       1        1       0      0      1         1          1
## 2       0       0        0       1      0      0         1          1
## 3       0       0        0       1      0      0         1          1
## 4       1       1        0       1      1      1         0          1
## 5       1       1        1       0      1      1         1          1
## 6       0       0        0       1      0      0         1          1
##   Woodman Pollit Raffo Piazza Berckemeyer Llosa Barber Beoutis Ledesma
## 1              1     1      1           1            1               0
## 2              0     0      1           0            0               1
## 3              0     0      1           0            0               1
## 4              0     1      0           1            0               0
## 5              1     1      1           0            1               1
## 6              0     0      1           1            0               1
##   Rizo Patron Montori Sotomayor Cilloniz Ferreyros Michell Wong Lu
## 1           1       1         0        0         0       0       0
## 2           0       0         0        0         0       1       0
## 3           0       0         0        0         0       1       0
## 4           1       1         1        0         0       0       0
## 5           0       0         1        1         0       0       0
## 6           1       0         1        0         0       0       0
##   Batievsky Spack Matos Escalada Galsky Lucioni Rodriguez Rodriguez Custer
## 1               0              0      0       0                   0      0
## 2               0              0      0       0                   0      0
## 3               0              0      0       0                   0      0
## 4               0              0      0       0                   0      0
## 5               0              0      0       0                   0      0
## 6               0              0      0       0                   0      0
##   Ikeda Cogorno Arias Davila
## 1     0       0            0
## 2     0       0            0
## 3     0       0            0
## 4     0       0            0
## 5     0       0            0
## 6     0       0            0

Let’s move the column Names to the row names; then we will get a square matrix:

row.names(adjacency)=adjacency$Names
adjacency$Names=NULL

# then
head(adjacency)
##              Romero Grana Miro Quesada Moreyra Fort De La Puente Wiese Onrubia
## Romero            0     1            1       1    1            1     0       1
## Grana             1     0            1       0    1            1     1       0
## Miro Quesada      1     1            0       0    1            1     1       0
## Moreyra           1     0            0       0    1            1     1       1
## Fort              1     1            1       1    0            1     0       1
## De La Puente      1     1            1       1    1            0     1       0
##              Brescia Nicolini Montero Picaso Bentin Benavides Bustamante
## Romero             1        1       0      0      1         1          1
## Grana              0        0       1      0      0         1          1
## Miro Quesada       0        0       1      0      0         1          1
## Moreyra            1        0       1      1      1         0          1
## Fort               1        1       0      1      1         1          1
## De La Puente       0        0       1      0      0         1          1
##              Woodman Pollit Raffo Piazza Berckemeyer Llosa Barber
## Romero                    1     1      1           1            1
## Grana                     0     0      1           0            0
## Miro Quesada              0     0      1           0            0
## Moreyra                   0     1      0           1            0
## Fort                      1     1      1           0            1
## De La Puente              0     0      1           1            0
##              Beoutis Ledesma Rizo Patron Montori Sotomayor Cilloniz Ferreyros
## Romero                     0           1       1         0        0         0
## Grana                      1           0       0         0        0         0
## Miro Quesada               1           0       0         0        0         0
## Moreyra                    0           1       1         1        0         0
## Fort                       1           0       0         1        1         0
## De La Puente               1           1       0         1        0         0
##              Michell Wong Lu Batievsky Spack Matos Escalada Galsky Lucioni
## Romero             0       0               0              0      0       0
## Grana              1       0               0              0      0       0
## Miro Quesada       1       0               0              0      0       0
## Moreyra            0       0               0              0      0       0
## Fort               0       0               0              0      0       0
## De La Puente       0       0               0              0      0       0
##              Rodriguez Rodriguez Custer Ikeda Cogorno Arias Davila
## Romero                         0      0     0       0            0
## Grana                          0      0     0       0            0
## Miro Quesada                   0      0     0       0            0
## Moreyra                        0      0     0       0            0
## Fort                           0      0     0       0            0
## De La Puente                   0      0     0       0            0

This matrix is still stored as a data frame, so it now has to be converted into a matrix:

adjacency=as.matrix(adjacency) # This coerces the object into a matrix, just in case
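A quick sanity check can confirm that the result is usable as an undirected adjacency matrix. The sketch below uses a small made-up data frame (`toy` and its labels are hypothetical, not taken from the data above) and base R only: it moves the names column to the row names, converts to a matrix, and verifies that the matrix is square, labeled consistently, symmetric, and has an empty diagonal.

```r
# Hypothetical toy data with the same layout as the imported sheet
toy = data.frame(Names = c("A", "B", "C"),
                 A = c(0, 1, 1),
                 B = c(1, 0, 0),
                 C = c(1, 0, 0))

# move the names column to the row names, then coerce to a matrix
row.names(toy) = toy$Names
toy$Names = NULL
toyMx = as.matrix(toy)

# checks: square, same labels on both sides, symmetric, empty diagonal
isSquare = nrow(toyMx) == ncol(toyMx)
sameLabels = identical(rownames(toyMx), colnames(toyMx))
isSym = isSymmetric(unname(toyMx))
noLoops = all(diag(toyMx) == 0)
c(isSquare, sameLabels, isSym, noLoops)
```

The same four checks could be run on `adjacency` before building the network.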

From this kind of structure (the adjacency matrix), we can easily create a network via igraph:

library(igraph)
# graph_from_adjacency_matrix() is the current name of graph.adjacency()
EliteNet=graph_from_adjacency_matrix(adjacency,mode="undirected",weighted=NULL)
# see it here
EliteNet
## IGRAPH a57fd12 UN-- 37 135 -- 
## + attr: name (v/c)
## + edges from a57fd12 (vertex names):
##  [1] Romero--Grana          Romero--Miro Quesada   Romero--Moreyra       
##  [4] Romero--Fort           Romero--De La Puente   Romero--Onrubia       
##  [7] Romero--Brescia        Romero--Nicolini       Romero--Bentin        
## [10] Romero--Benavides      Romero--Bustamante     Romero--Woodman Pollit
## [13] Romero--Raffo          Romero--Piazza         Romero--Berckemeyer   
## [16] Romero--Llosa Barber   Romero--Rizo Patron    Romero--Montori       
## [19] Grana --Miro Quesada   Grana --Fort           Grana --De La Puente  
## [22] Grana --Wiese          Grana --Montero        Grana --Benavides     
## + ... omitted several edges

A network is composed of nodes (aka vertices) and edges that connect them. You can count how many of each you have like this:

vcount(EliteNet) #count of nodes
## [1] 37
ecount(EliteNet) #count of edges
## [1] 135
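For an undirected network built from a symmetric 0/1 matrix, the edge count should equal the number of ones in the upper triangle of that matrix. A minimal sketch with a made-up matrix (`toyMx` and `toyNet` are hypothetical names):

```r
library(igraph)

# Hypothetical symmetric 0/1 adjacency matrix for four nodes
toyMx = matrix(c(0, 1, 1, 0,
                 1, 0, 1, 0,
                 1, 1, 0, 1,
                 0, 0, 1, 0),
               nrow = 4, byrow = TRUE,
               dimnames = list(LETTERS[1:4], LETTERS[1:4]))

toyNet = graph_from_adjacency_matrix(toyMx, mode = "undirected")

# each undirected link appears twice in the matrix, once per triangle
edgesInMatrix = sum(toyMx[upper.tri(toyMx)])
c(nodes = vcount(toyNet), edges = ecount(toyNet), fromMatrix = edgesInMatrix)
```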

You can take a look at what this network looks like:

plot.igraph(EliteNet,
            vertex.color = 'yellow',
            edge.color='lightblue')

So far we only have nodes and their links. Let’s bring in some information about the nodes:

# The adjacency matrix did not include the nodes attributes.
attributes=import(linkAdjMx,which = 2)
head(attributes)
##          Nodes multinational
## 1       Romero             1
## 2        Grana             1
## 3 Miro Quesada             1
## 4      Moreyra             1
## 5         Fort             1
## 6 De La Puente             1

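Adding an attribute is easy with igraph. The values are assigned by position, so the rows of the attribute table must be in the same order as the nodes. Let’s proceed with the change: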

EliteNet=set_vertex_attr(EliteNet,"multi",value=attributes$multinational)

#then
EliteNet
## IGRAPH a57fd12 UN-- 37 135 -- 
## + attr: name (v/c), multi (v/n)
## + edges from a57fd12 (vertex names):
##  [1] Romero--Grana          Romero--Miro Quesada   Romero--Moreyra       
##  [4] Romero--Fort           Romero--De La Puente   Romero--Onrubia       
##  [7] Romero--Brescia        Romero--Nicolini       Romero--Bentin        
## [10] Romero--Benavides      Romero--Bustamante     Romero--Woodman Pollit
## [13] Romero--Raffo          Romero--Piazza         Romero--Berckemeyer   
## [16] Romero--Llosa Barber   Romero--Rizo Patron    Romero--Montori       
## [19] Grana --Miro Quesada   Grana --Fort           Grana --De La Puente  
## [22] Grana --Wiese          Grana --Montero        Grana --Benavides     
## + ... omitted several edges

It should have worked:

vertex_attr_names(EliteNet) 
## [1] "name"  "multi"
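Assigning by position works only when the attribute table follows the vertex order. When that is not guaranteed, matching by name is a safer option. The sketch below is an illustration under that assumption; `toyNet` and `toyAttr` are made up, with the table rows deliberately shuffled.

```r
library(igraph)

# Hypothetical network and attribute table (rows deliberately shuffled)
toyNet = graph_from_literal(A-B, B-C)
toyAttr = data.frame(Nodes = c("C", "A", "B"),
                     multinational = c(0, 1, 1))

# align the table to the vertex order before assigning
pos = match(V(toyNet)$name, toyAttr$Nodes)
toyNet = set_vertex_attr(toyNet, "multi", value = toyAttr$multinational[pos])

data.frame(name = V(toyNet)$name, multi = V(toyNet)$multi)
```

If `pos` contains any `NA`, some node has no row in the attribute table and should be checked before going on.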

Before going further, it is good to know if our network is connected:

is_connected(EliteNet)
## [1] FALSE

So these people are split into separate components. How many?

components(EliteNet)$no
## [1] 8

Which nodes are in each component?

groups(components(EliteNet))
## $`1`
##  [1] "Romero"          "Grana"           "Miro Quesada"    "Moreyra"        
##  [5] "Fort"            "De La Puente"    "Wiese"           "Onrubia"        
##  [9] "Brescia"         "Nicolini"        "Montero"         "Picaso"         
## [13] "Bentin"          "Benavides"       "Bustamante"      "Woodman Pollit" 
## [17] "Raffo"           "Piazza"          "Berckemeyer"     "Llosa Barber"   
## [21] "Beoutis Ledesma" "Rizo Patron"     "Montori"         "Sotomayor"      
## [25] "Cilloniz"        "Ferreyros"       "Michell"         "Wong Lu"        
## 
## $`2`
## [1] "Batievsky Spack" "Matos Escalada"  "Galsky"         
## 
## $`3`
## [1] "Lucioni"
## 
## $`4`
## [1] "Rodriguez Rodriguez"
## 
## $`5`
## [1] "Custer"
## 
## $`6`
## [1] "Ikeda"
## 
## $`7`
## [1] "Cogorno"
## 
## $`8`
## [1] "Arias Davila"

Let me add the component as an attribute:

component=components(EliteNet)$membership
EliteNet=set_vertex_attr(EliteNet,"component",value=component)
#then
EliteNet
## IGRAPH a57fd12 UN-- 37 135 -- 
## + attr: name (v/c), multi (v/n), component (v/n)
## + edges from a57fd12 (vertex names):
##  [1] Romero--Grana          Romero--Miro Quesada   Romero--Moreyra       
##  [4] Romero--Fort           Romero--De La Puente   Romero--Onrubia       
##  [7] Romero--Brescia        Romero--Nicolini       Romero--Bentin        
## [10] Romero--Benavides      Romero--Bustamante     Romero--Woodman Pollit
## [13] Romero--Raffo          Romero--Piazza         Romero--Berckemeyer   
## [16] Romero--Llosa Barber   Romero--Rizo Patron    Romero--Montori       
## [19] Grana --Miro Quesada   Grana --Fort           Grana --De La Puente  
## [22] Grana --Wiese          Grana --Montero        Grana --Benavides     
## + ... omitted several edges

A visual representation follows:

Labels=component
numberOfClasses = length(unique(Labels)) 

#preparing color
library(RColorBrewer)
colorForScale='Set2'
colors = brewer.pal(numberOfClasses, colorForScale)

# plotting
plot.igraph(EliteNet,
             vertex.color = colors[Labels],
             edge.color='lightblue')

As we do not have ONE connected network but several components, we can focus on the Giant Component (the component with the most nodes). Follow these steps:

  1. Get the sizes of each component:
(Sizes=components(EliteNet)$csize)
## [1] 28  3  1  1  1  1  1  1
  2. Get the subnet with the largest component:
# this is a subnet (induced_subgraph() is the current name of induced.subgraph())
EliteNet_giant=induced_subgraph(EliteNet, which(Labels == which.max(Sizes)))
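The two steps above can be sketched on a small made-up network, which makes it easy to confirm that only the largest component survives. The edge list and node names below are hypothetical.

```r
library(igraph)

# Hypothetical network: a triangle, a pair, and an isolated node
edges = data.frame(from = c("A", "B", "C", "D"),
                   to   = c("B", "C", "A", "E"))
nodes = data.frame(name = c("A", "B", "C", "D", "E", "F"))
toyNet = graph_from_data_frame(edges, directed = FALSE, vertices = nodes)

comp = components(toyNet)
comp$csize                      # size of each component

# keep only the nodes whose membership is the largest component
giantId = which.max(comp$csize)
toyGiant = induced_subgraph(toyNet, which(comp$membership == giantId))

V(toyGiant)$name
```

Note that `which.max()` returns the first maximum, so if two components tie in size only one of them is kept.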

Let’s take a look at the Giant Component:

plot.igraph(EliteNet_giant)

Basic summary:

summary(EliteNet_giant)
## IGRAPH 2f79726 UN-- 28 133 -- 
## + attr: name (v/c), multi (v/n), component (v/n)

We will use the giant component as our network to be explored.

2. Exploring the Network

Exploring the Network as a whole

  • Density: from 0 to 1, where 1 makes it a ‘complete’ network: there is a link between every pair of nodes.
# edge_density() is the current name of graph.density()
edge_density(EliteNet_giant)
## [1] 0.3518519
  • Diameter: the worst-case scenario for the number of steps someone needs to contact someone else, that is, the longest of the shortest paths (only meaningful for a connected component).
diameter(EliteNet_giant)
## [1] 4
  • Local clustering coefficient of a node is a way to measure the level of connectivity among its neighbors. If all its neighbors are connected to one another you get 1; if none of them are connected you get zero. Then, the average clustering coefficient tells you the average of those values.
# average of the local clustering coefficients:
transitivity(EliteNet_giant,type = 'average')
## [1] 0.6537019
  • Shortest path (average): it gets the average of every shortest path among the nodes in the network. A shortest path is the shortest walk from one node to another.
# mean_distance() is the current name of average.path.length()
mean_distance(EliteNet_giant)
## [1] 1.740741
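These whole-network measures can be checked by hand. For an undirected network with n nodes and m links, density is 2m / (n(n-1)); with the 28 nodes and 133 links of the giant component that gives 266/756, about 0.352, which matches the value reported above. The sketch below repeats the checks on a made-up network (a triangle with one extra node attached); all object names are hypothetical.

```r
library(igraph)

# Hypothetical network: triangle A, B, C with D attached to C
toyNet = graph_from_literal(A-B-C-A, C-D)
n = vcount(toyNet); m = ecount(toyNet)

# density: observed links over possible links
densByHand = 2 * m / (n * (n - 1))

# average shortest path and diameter from the matrix of distances
d = distances(toyNet)
pathByHand = mean(d[upper.tri(d)])
diamByHand = max(d)

# local clustering per node (NaN for nodes with fewer than two neighbors)
localCC = transitivity(toyNet, type = "local")

c(densByHand, edge_density(toyNet), pathByHand, mean_distance(toyNet),
  diamByHand, diameter(toyNet))
```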

Random networks have a small average shortest path and a small clustering coefficient. Is this the case here? The high clustering coefficient would instead suggest a small world: most nodes are not neighbors of one another, but most nodes can be reached from every other in a few steps.

  • Transitivity: how probable it is that two businessmen with a common business friend are also friends.
transitivity(EliteNet_giant)
## [1] 0.5829694
  • Assortativity (degree): it is a measure to see if nodes are connecting to other nodes similar in degree. Closer to 1 means higher assortativity, closer to -1 means disassortativity, while 0 means no assortativity.
assortativity_degree(EliteNet_giant)
## [1] -0.1208671

You can also compute assortativity using an attribute of interest:

attrNet=V(EliteNet_giant)$multi
assortativity(EliteNet_giant,attrNet)
## [1] -0.07258065
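Because `multi` is a category coded as 0/1, a categorical version of assortativity may also be of interest. The sketch below uses igraph's `assortativity_nominal()`, which expects categories coded as integers starting at one (hence the `+ 1`). The toy network and its `group` values are made up for illustration.

```r
library(igraph)

# Hypothetical network: links only exist inside each group
toyNet = graph_from_literal(A-B, C-D)
group = ifelse(V(toyNet)$name %in% c("A", "B"), 0, 1)   # 0/1 attribute

# categories must start at 1, so shift the 0/1 coding
r = assortativity_nominal(toyNet, group + 1, directed = FALSE)
r
```

Here every link joins two nodes of the same group, so the coefficient reaches its maximum of 1.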

Coloring by attribute:

LabelsColor=attrNet+1
colors=c('lightblue','magenta')
plot.igraph(EliteNet_giant,
       vertex.color = colors[LabelsColor])

Exploration of network communities

A clique can be understood as a community of nodes where all of them are connected to one another.

  • How many cliques can be found in this network?
length(cliques(EliteNet_giant))
## [1] 1074

If a clique in the network cannot be made bigger by adding another node, then you have a maximal clique.

  • How many maximal cliques are there in this network?
# how many maximal cliques
count_max_cliques(EliteNet_giant)
## [1] 28

You can find the size of the maximum (largest) cliques:

clique_num(EliteNet_giant)
## [1] 8

You can see each maximum clique like this:

max_cliques(EliteNet_giant,min=8)
## [[1]]
## + 8/28 vertices, named, from 2f79726:
## [1] Onrubia        Romero         Raffo          Bentin         Fort          
## [6] Llosa Barber   Woodman Pollit Nicolini      
## 
## [[2]]
## + 8/28 vertices, named, from 2f79726:
## [1] Onrubia     Romero      Raffo       Bentin      Berckemeyer Montori    
## [7] Brescia     Moreyra    
## 
## [[3]]
## + 8/28 vertices, named, from 2f79726:
## [1] Benavides    Romero       Piazza       Bustamante   De La Puente
## [6] Fort         Miro Quesada Grana
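The difference between cliques, maximal cliques, and the largest clique is easier to see on a tiny network. The sketch below uses a made-up graph (a triangle with one extra node attached); the names are hypothetical.

```r
library(igraph)

# Hypothetical network: a triangle (A, B, C) with D attached to C
toyNet = graph_from_literal(A-B-C-A, C-D)

# every complete subgraph with at least two nodes: 4 links + 1 triangle
length(cliques(toyNet, min = 2))

# maximal cliques cannot grow: the triangle and the link C-D
count_max_cliques(toyNet)

# size of the largest clique, and the clique itself
clique_num(toyNet)
largest_cliques(toyNet)
```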

If a network presents cliques, you may suspect that there are communities.

This is a huge field of research; let me show you one of the algorithms, known as the Louvain method.

communities=cluster_louvain(EliteNet_giant)
(partition=membership(communities))
##          Romero           Grana    Miro Quesada         Moreyra            Fort 
##               1               2               2               3               2 
##    De La Puente           Wiese         Onrubia         Brescia        Nicolini 
##               2               3               1               1               1 
##         Montero          Picaso          Bentin       Benavides      Bustamante 
##               2               3               1               2               2 
##  Woodman Pollit           Raffo          Piazza     Berckemeyer    Llosa Barber 
##               1               1               2               1               1 
## Beoutis Ledesma     Rizo Patron         Montori       Sotomayor        Cilloniz 
##               2               3               1               3               3 
##       Ferreyros         Michell         Wong Lu 
##               2               2               1

Now, use those values to make a plot to highlight the communities:

Labels=partition
numberOfClasses = length(unique(Labels)) 

library(RColorBrewer)
colorForScale='Set2'
colors = brewer.pal(numberOfClasses, colorForScale)

plot.igraph(EliteNet_giant,
             vertex.color = colors[Labels],
             edge.color='lightblue')
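The Louvain method visits nodes in a random order, so the partition may change between runs; fixing the seed is one way to make a run repeatable (an assumption about your workflow, not something done in the code above). The sketch below uses a made-up network of two triangles joined by one link and reports the modularity of the partition found.

```r
library(igraph)

# Hypothetical network: two triangles joined by the link C-D
toyNet = graph_from_literal(A-B-C-A, D-E-F-D, C-D)

set.seed(123)                       # arbitrary seed
toyComm = cluster_louvain(toyNet)

membership(toyComm)                 # community of each node
sizes(toyComm)                      # nodes per community
modularity(toyComm)                 # quality of the partition
```

Higher modularity means more links fall inside communities than would be expected by chance.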

Let’s turn our attention to the nodes and their roles in the network.

Exploration of network actors

rounding=3
degr=round(degree(EliteNet_giant,normalized=TRUE),rounding)
close=round(closeness(EliteNet_giant,normalized=TRUE),rounding)
betw=round(betweenness(EliteNet_giant,normalized=TRUE),rounding)

DFCentrality=as.data.frame(cbind(degr,close,betw),stringsAsFactors = F)
names(DFCentrality)=c('Degree','Closeness','Betweenness')
DFCentrality$person=row.names(DFCentrality)
row.names(DFCentrality)=NULL
head(DFCentrality)
##   Degree Closeness Betweenness       person
## 1  0.667     0.750       0.102       Romero
## 2  0.407     0.614       0.043        Grana
## 3  0.407     0.614       0.043 Miro Quesada
## 4  0.556     0.675       0.066      Moreyra
## 5  0.704     0.771       0.155         Fort
## 6  0.519     0.659       0.039 De La Puente
library(ggplot2)
ggplot(DFCentrality, aes(x=Betweenness, y=Closeness)) + theme_classic()+
  scale_size(range = c(1, 25))  + geom_text(aes(label=person,color=Degree)) +
  scale_colour_gradient(low = "orange", high = "black")
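With `normalized=TRUE`, degree is divided by n-1, the largest degree a node can have. For instance, in the giant component (28 nodes) Fort's 0.704 corresponds to 19 links out of 27 possible. The sketch below shows the three measures on a made-up star network, where the center is the most central node on every measure; `star` is a hypothetical name.

```r
library(igraph)

star = make_star(5, mode = "undirected")   # node 1 is the center

degree(star)                         # raw counts: 4 for the center, 1 for leaves
degree(star, normalized = TRUE)      # divided by n - 1 = 4
closeness(star, normalized = TRUE)   # inverse of the average distance to others
betweenness(star, normalized = TRUE) # how often a node lies on shortest paths
```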

The node with the highest degree could be considered a hub in the network:

DFCentrality[which.max(DFCentrality$Degree),]
##   Degree Closeness Betweenness person
## 5  0.704     0.771       0.155   Fort

We can plot the neighbors of the hub, its ego network:

  1. Determine the hub name:
#who
hub=DFCentrality[which.max(DFCentrality$Degree),]$person
  2. Determine the hub position:
#where (a character to numeric)
hubix=as.numeric(row.names(DFCentrality[which.max(DFCentrality$Degree),]))
  3. Request the ego network of the hub:
HubEgonets=make_ego_graph(EliteNet_giant, nodes=hubix)
# HubEgonets is a list, get the first one:
HubEgonet=HubEgonets[[1]]
  4. Just plot the ego you got:
egoSizes=rep(5,vcount(HubEgonet)) # sizes '5' for every node
# locate the hub by name: its position in the ego network may differ from hubix
egoSizes[V(HubEgonet)$name==hub]=40  # size '40' for this one
V(HubEgonet)$size=egoSizes # saving sizes
plot.igraph(HubEgonet,
             vertex.color = 'yellow',
             edge.color='lightblue')
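An ego network keeps only the hub and its direct neighbors, so it has fewer nodes than the network it comes from, and finding the hub by name avoids relying on its position. A minimal sketch on a made-up network (all names are hypothetical):

```r
library(igraph)

# Hypothetical network where C has the most links
toyNet = graph_from_literal(A-B, B-C, C-D, C-E, C-F)
hubName = names(which.max(degree(toyNet)))

# ego network: the hub plus its direct neighbors (order = 1)
toyEgo = make_ego_graph(toyNet, order = 1, nodes = hubName)[[1]]

# find the hub by name inside the ego network
egoSizes = rep(5, vcount(toyEgo))
egoSizes[V(toyEgo)$name == hubName] = 40
V(toyEgo)$size = egoSizes

data.frame(name = V(toyEgo)$name, size = V(toyEgo)$size)
```

Node A is two steps away from the hub, so it does not appear in the ego network.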

Can this network be disconnected? If so, we can compute the minimum number of nodes that must be removed to disconnect the network (create at least two components):

vertex_connectivity(EliteNet_giant)
## [1] 1

Who is the sole node with the power to break the network?

(cut=articulation_points(EliteNet_giant))
## + 1/28 vertex, named, from 2f79726:
## [1] Bentin
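A network can have more than one articulation point. The sketch below uses a made-up network (two triangles joined by a single link) where either end of the joining link splits the network when removed; the object names are hypothetical.

```r
library(igraph)

# Hypothetical network: two triangles joined by the link C-D
toyNet = graph_from_literal(A-B-C-A, D-E-F-D, C-D)

vertex_connectivity(toyNet)        # one removal is enough to split it
cutNodes = articulation_points(toyNet)
cutNodes$name                      # here both C and D qualify

# removing one of them leaves more than one component
afterCut = delete_vertices(toyNet, "C")
components(afterCut)$no
```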

We can highlight the articulation node in the giant component:

# match by name, so this also works if there is more than one articulation point
cutix=which(V(EliteNet_giant)$name %in% cut$name)

allSizes=rep(10,vcount(EliteNet_giant))
allSizes[cutix]=40
V(EliteNet_giant)$size=allSizes # saving sizes
plot.igraph(EliteNet_giant,
             vertex.color = 'yellow',
             edge.color='lightblue',vertex.shape='sphere')

3. Exporting the Network

write_graph(EliteNet, "EliteNetR.graphml", "graphml")
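To confirm that an exported file keeps the nodes, links, and attributes, it can be read back and compared. The sketch below does this round trip with a made-up network and a temporary file (`toyFile` is a hypothetical name); it assumes your igraph installation was built with GraphML support.

```r
library(igraph)

# Hypothetical network with a made-up node attribute
toyNet = graph_from_literal(A-B-C-A, C-D)
V(toyNet)$multi = c(1, 0, 1, 0)

toyFile = tempfile(fileext = ".graphml")   # temporary file
write_graph(toyNet, toyFile, format = "graphml")

# read it back and compare
toyBack = read_graph(toyFile, format = "graphml")
c(vcount(toyBack), ecount(toyBack))
vertex_attr_names(toyBack)
```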